Open
Contributor
1a454b0 to
dcb75d2
Compare
Contributor
Author
/ok to test
Contributor
🟩 CI finished in 23m 04s: Pass: 100%/54 | Total: 4h 36m | Avg: 5m 06s | Max: 17m 44s | Hits: 89%/224
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 54)
| # | Runner |
|---|---|
| 43 | linux-amd64-cpu16 |
| 5 | linux-amd64-gpu-v100-latest-1 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
Contributor
Author
/ok to test
Contributor
🟩 CI finished in 1h 12m: Pass: 100%/54 | Total: 4h 28m | Avg: 4m 58s | Max: 23m 23s | Hits: 89%/224
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 54)
| # | Runner |
|---|---|
| 43 | linux-amd64-cpu16 |
| 5 | linux-amd64-gpu-v100-latest-1 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
miscco
reviewed
Nov 4, 2024
cudax/include/cuda/experimental/__stf/utility/stackable_ctx.cuh
ff0ba38 to
fb3de98
Compare
caugonnet
commented
Jan 14, 2025
Contributor
Author
/ok to test
Contributor
🟩 CI finished in 40m 53s: Pass: 100%/20 | Total: 3h 17m | Avg: 9m 53s | Max: 24m 35s | Hits: 582%/312
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 20)
| # | Runner |
|---|---|
| 12 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-v100-latest-1 |
Contributor
Author
/ok to test
Contributor
🟩 CI finished in 42m 04s: Pass: 100%/20 | Total: 4h 07m | Avg: 12m 22s | Max: 22m 12s | Hits: 582%/312
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 20)
| # | Runner |
|---|---|
| 12 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-v100-latest-1 |
andralex
reviewed
Jan 16, 2025
cudax/include/cuda/experimental/__stf/utility/stackable_ctx.cuh
caugonnet
commented
Jan 22, 2025
cudax/include/cuda/experimental/__stf/utility/stackable_ctx.cuh
caugonnet
commented
Jan 22, 2025
 * @brief This class defines a context that behaves as a context which can have nested subcontexts (implemented as local
 * CUDA graphs)
 */
class stackable_ctx
Contributor
Author
We need an == operator too.
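For illustration only, here is a minimal sketch of what an equality operator could look like on a handle-style context class. It assumes the class is a cheap-to-copy handle over shared state, which this thread does not confirm for stackable_ctx; `handle_ctx_sketch`, `impl`, and `pimpl_` are hypothetical names, not the PR's code.

```cpp
#include <cassert>
#include <memory>

// Hypothetical handle-style context: copies share one implementation object.
// This is an assumption for the sketch, not the actual stackable_ctx layout.
class handle_ctx_sketch
{
  struct impl
  {
    int depth = 0; // placeholder state
  };

public:
  handle_ctx_sketch()
      : pimpl_(std::make_shared<impl>())
  {}

  // Two handles compare equal when they refer to the same underlying state.
  friend bool operator==(const handle_ctx_sketch& a, const handle_ctx_sketch& b)
  {
    return a.pimpl_ == b.pimpl_;
  }

  friend bool operator!=(const handle_ctx_sketch& a, const handle_ctx_sketch& b)
  {
    return !(a == b);
  }

private:
  std::shared_ptr<impl> pimpl_;
};
```

With this identity-based definition, a copy of a handle compares equal to the original, while two independently constructed handles do not. Whether identity or value semantics is wanted here is a design choice left to the PR.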
caugonnet
commented
Jan 22, 2025
cudax/include/cuda/experimental/__stf/utility/stackable_ctx.cuh
caugonnet
commented
Jan 22, 2025
cudax/include/cuda/experimental/__stf/utility/stackable_ctx.cuh
caugonnet
commented
Jan 22, 2025
cudax/include/cuda/experimental/__stf/internal/logical_data.cuh
caugonnet
commented
Jan 29, 2025
  ctx.pop();
}
Contributor
Author
TODO check results.
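One generic way to check results in a test like this is to replay the same element-wise steps on the host and compare against the device output. The sketch below is host-only and assumes integer data and per-element steps; `replay_on_host` and `first_mismatch` are hypothetical helpers, not part of STF or of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: apply each step, in order, to every element of a copy
// of the input, producing the reference result on the host.
inline std::vector<int> replay_on_host(std::vector<int> data,
                                       const std::vector<std::function<int(int)>>& steps)
{
  for (const auto& step : steps)
  {
    for (int& x : data)
    {
      x = step(x);
    }
  }
  return data;
}

// Hypothetical helper: index of the first differing element, or -1 if the two
// vectors have the same size and contents.
inline std::ptrdiff_t first_mismatch(const std::vector<int>& actual, const std::vector<int>& expected)
{
  const std::size_t n = actual.size() < expected.size() ? actual.size() : expected.size();
  for (std::size_t i = 0; i < n; ++i)
  {
    if (actual[i] != expected[i])
    {
      return static_cast<std::ptrdiff_t>(i);
    }
  }
  return actual.size() == expected.size() ? -1 : static_cast<std::ptrdiff_t>(n);
}
```

In a test, the vector read back after the context is finalized would be passed as `actual`, and the test would expect `first_mismatch` to return -1.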
Contributor
Author
/ok to test
Contributor
🟨 CI finished in 39m 08s: Pass: 85%/20 | Total: 4h 10m | Avg: 12m 30s | Max: 17m 58s | Hits: 388%/522
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 20)
| # | Runner |
|---|---|
| 12 | linux-amd64-cpu16 |
| 4 | linux-arm64-cpu16 |
| 2 | windows-amd64-cpu16 |
| 2 | linux-amd64-gpu-v100-latest-1 |
Contributor
Author
/ok to test 69d714b
caugonnet
commented
Mar 20, 2026
Replay all five transformation steps on the CPU after finalize() and EXPECT each element matches, ensuring the graph_scope RAII test verifies correctness rather than just running without error. Made-with: Cursor
Both operations must abort when called inside a nested context (i.e. after push/graph_scope). The tests use the same SIGABRT handler pattern as the existing stackable error checks. Made-with: Cursor
Verifies that read-only data is auto-pushed as read in nested graph scopes and that the original host buffer is not modified after finalize, confirming write-back is skipped for read-only data. Made-with: Cursor
Computes sqrt of 1..1024 via Newton's Babylonian method, iterating until max |change| < 1e-12. Demonstrates while_graph_scope with reduce-based convergence checking in ~80 lines, as a simpler introduction than the existing Jacobi examples. Made-with: Cursor
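For readers unfamiliar with the numerical scheme in the last commit, the following host-only sketch shows the Babylonian (Newton) iteration with a max-|change| stopping test. It is not the PR's example: it uses a plain loop where the commit uses while_graph_scope and a reduction, and the initial guess x = a, the `max_iters` cap, and the names `newton_result` and `babylonian_sqrt` are assumptions made here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct newton_result
{
  std::vector<double> roots;
  int iterations;
};

// Babylonian iteration x <- (x + a / x) / 2 applied to every element, repeated
// until the largest per-element change falls below tol. Inputs are assumed to
// be strictly positive.
inline newton_result babylonian_sqrt(const std::vector<double>& values, double tol = 1e-12, int max_iters = 100)
{
  std::vector<double> x(values); // initial guess: x = a (an assumption of this sketch)
  int iterations = 0;
  double max_change = 0.0;
  do
  {
    max_change = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      const double next = 0.5 * (x[i] + values[i] / x[i]);
      // Host-side stand-in for the reduce-based convergence check.
      max_change = std::max(max_change, std::fabs(next - x[i]));
      x[i] = next;
    }
    ++iterations;
  } while (max_change >= tol && iterations < max_iters);
  return {x, iterations};
}
```

The loop condition is the part that maps onto a conditional graph: each pass computes one scalar (the maximum change) and the next pass runs only while that scalar is still at or above the tolerance.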
Contributor
Author
/ok to test f7c3c19
wait() returns a value and requires a copyable scalar type; slice<int> causes an incomplete-type compilation error. Switch to scalar_view<int> which is the intended usage pattern for wait(). Made-with: Cursor
Made-with: Cursor
Move graph_scope_guard and while_graph_scope_guard out of the stackable_ctx class into standalone definitions in stackable_ctx.cuh, consistent with repeat_graph_scope_guard which was already standalone. The nested-name syntax (stackable_ctx::graph_scope_guard) is preserved via forward declarations inside the class. Factory methods are now defined out-of-line after the guard classes. Reduces stackable_ctx_impl.cuh from 1654 to 1441 lines. Made-with: Cursor
The previous commit placed guard definitions after the UNITTESTED_FILE section, causing incomplete-type errors in the inline unit tests. Move all guard definitions (graph_scope_guard, while_graph_scope_guard, repeat_graph_scope_guard) before the #ifdef UNITTESTED_FILE block. Made-with: Cursor
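The refactoring described in the two commits above relies on a standard C++ pattern: forward-declare the nested class inside the enclosing class, define it outside, and define the factory out-of-line once the guard type is complete. A minimal sketch follows, using hypothetical names (`ctx_sketch`, `scope_guard`, `scope()`) and a depth counter in place of real graph scopes; it is not the actual stackable_ctx code.

```cpp
#include <cassert>

// Hypothetical stand-in for a context with nested scopes.
class ctx_sketch
{
public:
  class scope_guard; // forward declaration keeps the ctx_sketch::scope_guard spelling

  scope_guard scope(); // factory; defined below, after scope_guard is complete

  void push() { ++depth_; }
  void pop() { --depth_; }
  int depth() const { return depth_; }

private:
  int depth_ = 0;
};

// The guard is defined outside the enclosing class body.
class ctx_sketch::scope_guard
{
public:
  explicit scope_guard(ctx_sketch& ctx)
      : ctx_(&ctx)
  {
    ctx_->push(); // enter the nested scope
  }

  // Moving transfers responsibility for the pop to the new guard.
  scope_guard(scope_guard&& other) noexcept
      : ctx_(other.ctx_)
  {
    other.ctx_ = nullptr;
  }

  scope_guard(const scope_guard&)            = delete;
  scope_guard& operator=(const scope_guard&) = delete;
  scope_guard& operator=(scope_guard&&)      = delete;

  ~scope_guard()
  {
    if (ctx_)
    {
      ctx_->pop(); // leave the nested scope
    }
  }

private:
  ctx_sketch* ctx_;
};

// Out-of-line factory: legal here because scope_guard is now a complete type.
inline ctx_sketch::scope_guard ctx_sketch::scope()
{
  return scope_guard(*this);
}
```

Any code that creates or destroys a guard by value, such as inline unit tests, has to appear after the guard's definition. That ordering constraint is the kind of incomplete-type error the second commit describes fixing.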
Contributor
Author
/ok to test 7f486e5
Contributor
Author
/ok to test 5c8f6f0
Contributor
🥳 CI Workflow Results
🟩 Finished in 16m 14s: Pass: 100%/48 | Total: 4h 39m | Max: 14m 37s | Hits: 99%/26011
This STF-specific dot file cleanup utility doesn't belong in the benchmarks directory. It demangles and simplifies CUDA STF template names in dot graph output, so it belongs alongside the STF code. Made-with: Cursor
Description
This PR introduces helper methods that improve how contexts are nested, so that nested work can better leverage CUDA Graphs.
Checklist